@yongtang yongtang commented Feb 7, 2021

This PR patches arrow to temporarily resolve the ARROW-11518 issue.

See #1281 for details. (credit to @diggerk).

We will update arrow after the upstream PR is merged.

Signed-off-by: Yong Tang [email protected]

@yongtang yongtang merged commit 22eddcb into tensorflow:master Feb 8, 2021
@yongtang yongtang deleted the arrow-patch branch February 8, 2021 02:08
i-ony pushed a commit to i-ony/io that referenced this pull request Mar 8, 2021
…1304)

i-ony pushed a commit to i-ony/io that referenced this pull request Mar 15, 2021
…1304)

yongtang added a commit that referenced this pull request Mar 18, 2021
…he parsing time (#1283)

* Exposes num_parallel_reads and num_parallel_calls

-Exposes `num_parallel_reads` and `num_parallel_calls` in AvroRecordDataset and `make_avro_record_dataset`
-Adds parameter constraints
-Fixes lint issues
-Adds test method for _require() function
-This update adds a test to check if ValueErrors
are raised when given an invalid input for num_parallel_calls

* Bump Apache Arrow to 2.0.0 (#1231)

* Bump Apache Arrow to 2.0.0

Also bumps Apache Thrift to 0.13.0

Signed-off-by: Yong Tang <[email protected]>

* Update code to match Arrow

Signed-off-by: Yong Tang <[email protected]>

* Bump pyarrow to 2.0.0

Signed-off-by: Yong Tang <[email protected]>

* Stay with version=1 for write_feather to pass tests

Signed-off-by: Yong Tang <[email protected]>

* Bump flatbuffers to 1.12.0

Signed-off-by: Yong Tang <[email protected]>

* Fix Windows issue

Signed-off-by: Yong Tang <[email protected]>

* Fix tests

Signed-off-by: Yong Tang <[email protected]>

* Fix Windows

Signed-off-by: Yong Tang <[email protected]>

* Remove -std=c++11 and leave default -std=c++14 for arrow build

Signed-off-by: Yong Tang <[email protected]>

* Update sha256 of libapr1

As the hash was changed by the repo.

Signed-off-by: Yong Tang <[email protected]>

* Add emulator for gcs (#1234)

* Bump com_github_googleapis_google_cloud_cpp to `1.21.0`

* Add gcs testbench

* Bump `libcurl` to `7.69.1`

* Remove the CI build for CentOS 8 (#1237)

Building shared libraries on CentOS 8 is pretty much the same as
on Ubuntu 20.04, except that `apt` should be changed to `yum`. As a result,
our CentOS 8 CI test is not adding a lot of value.

Furthermore, with the upcoming CentOS 8 change:
https://www.phoronix.com/scan.php?page=news_item&px=CentOS-8-Ending-For-Stream

CentOS 8 effectively reaches EOL in 2021.

For these reasons we may want to drop the CentOS 8 build (and only leave a comment in README.md).

Note that we keep the CentOS 7 build for now, as there are still many users on
CentOS 7 and it will only reach EOL in 2024. We might drop the CentOS 7 build in
the future as well if CentOS 7 sees changes similar to those for CentOS 8.

Signed-off-by: Yong Tang <[email protected]>

* add tf-c-header rule (#1244)

* Skip tf-nightly:tensorflow-io==0.17.0 on API compatibility test (#1247)

Signed-off-by: Yong Tang <[email protected]>

* [s3] add support for testing on macOS (#1253)

* [s3] add support for testing on macOS

* modify docker-compose cmd

* add notebook formatting instruction in README (#1256)

* [docs] Restructure README.md content (#1257)

* Refactor README.md content

* bump to run ci jobs

* Update libtiff/libgeotiff dependency (#1258)

This PR updates libtiff/libgeotiff to the latest version.

Signed-off-by: Yong Tang <[email protected]>

* remove unstable elasticsearch test setup on macOS (#1263)

* Exposes num_parallel_reads and num_parallel_calls (#1232)

-Exposes `num_parallel_reads` and `num_parallel_calls` in AvroRecordDataset and `make_avro_record_dataset`
-Adds parameter constraints
-Fixes lint issues
- Adds test method for _require() function
-This update adds a test to check if ValueErrors
are raised when given an invalid input for num_parallel_calls

Co-authored-by: Abin Shahab <[email protected]>

* Added AVRO_PARSER_NUM_MINIBATCH to override num_minibatches

Added AVRO_PARSER_NUM_MINIBATCH to override num_minibatches. This is recommended to be set equal to the vcore request.
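
The override described above can be illustrated with a small Python sketch. Only the variable name `AVRO_PARSER_NUM_MINIBATCH` comes from the commit message; the `resolve_num_minibatches` helper, its fallback behaviour, and the positivity check are assumptions made for illustration, and the actual override lives in the kernel code, which this does not reproduce.

```python
import os
from typing import Mapping, Optional


def resolve_num_minibatches(default: int,
                            env: Optional[Mapping[str, str]] = None) -> int:
    """Return the AVRO_PARSER_NUM_MINIBATCH override, else `default`.

    Hypothetical helper, not part of tensorflow-io.
    """
    env = os.environ if env is None else env
    raw = env.get("AVRO_PARSER_NUM_MINIBATCH")
    if raw is None:
        return default
    value = int(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError("AVRO_PARSER_NUM_MINIBATCH must be positive")
    return value
```

Following the recommendation above, a job that requests 4 vcores would export `AVRO_PARSER_NUM_MINIBATCH=4` before starting.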

* Exposes num_parallel_reads and num_parallel_calls (#1232)

* Exposes num_parallel_reads and num_parallel_calls

-Exposes `num_parallel_reads` and `num_parallel_calls` in AvroRecordDataset and `make_avro_record_dataset`
-Adds parameter constraints
-Fixes lint issues

* Exposes num_parallel_reads and num_parallel_calls

-Exposes `num_parallel_reads` and `num_parallel_calls` in AvroRecordDataset and `make_avro_record_dataset`
-Adds parameter constraints
-Fixes lint issues

* Exposes num_parallel_reads and num_parallel_calls

-Exposes `num_parallel_reads` and `num_parallel_calls` in AvroRecordDataset and `make_avro_record_dataset`
-Adds parameter constraints
-Fixes lint issues

* Fixes Lint Issues

* Removes Optional typing for method parameter

-

* Adds test method for _require() function

-This update adds a test to check if ValueErrors
are raised when given an invalid input for num_parallel_calls

* Uncomments skip for macOS pytests

* Fixes Lint issues

Co-authored-by: Abin Shahab <[email protected]>

* add avro tutorial testing data (#1267)

Co-authored-by: Cheng Ren <[email protected]>

* Update Kafka tutorial to work with Apache Kafka (#1266)

* Update Kafka tutorial to work with Apache Kafka

Minor update to the Kafka tutorial to remove the dependency on
Confluent's distribution of Kafka, and instead work with vanilla
Apache Kafka.

Signed-off-by: Dale Lane <[email protected]>

* Address review comments

Remove redundant pip install commands

Signed-off-by: Dale Lane <[email protected]>

* add github workflow for performance benchmarking (#1269)

* add github workflow for performance benchmarking

* add github-action-benchmark step

* handle missing dependencies while benchmarking (#1271)

* handle missing dependencies while benchmarking

* setup test_sql

* job name change

* set auto-push to true

* remove auto-push

* add personal access token

* use alternate method to push to gh-pages

* add name to the action

* use different id

* modify creds

* use github_token

* change repo name

* set auto-push

* set origin and push results

* set env

* use PERSONAL_GITHUB_TOKEN

* use push changes action

* use github.head_ref to push the changes

* try using fetch-depth

* modify branch name

* use alternative push approach

* git switch -

* test by merging with forked master

* Disable s3 macOS for now as docker is not working on GitHub Actions for macOS (#1277)

* Revert "[s3] add support for testing on macOS (#1253)"

This reverts commit 81789bd.

Signed-off-by: Yong Tang <[email protected]>

* Update

Signed-off-by: Yong Tang <[email protected]>

* rename testing data files (#1278)

* Add tutorial for avro dataset API (#1250)

* remove docker based mongodb tests in macos (#1279)

* trigger benchmarks workflow only on commits (#1282)

* Bump Apache Arrow to 3.0.0 (#1285)

Signed-off-by: Yong Tang <[email protected]>

* Add bazel cache (#1287)

Signed-off-by: Yong Tang <[email protected]>

* Add initial bigtable stub test (#1286)

* Add initial bigtable stub test

Signed-off-by: Yong Tang <[email protected]>

* Fix kokoro test

Signed-off-by: Yong Tang <[email protected]>

* Add reference to github-pages benchmarks in README (#1289)

* add reference to github-pages benchmarks

* minor grammar change

* Update README.md

Co-authored-by: Yuan Tang <[email protected]>

Co-authored-by: Yuan Tang <[email protected]>

* Clear outputs (#1292)

* fix kafka online-learning section in tutorial notebook (#1274)

* kafka notebook fix for colab env

* change timeout from 30 to 20 seconds

* reduce stream_timeout

* Only enable bazel caching writes for tensorflow/io github actions (#1293)

This PR ensures that bazel cache writes are only enabled for GitHub Actions
runs on the tensorflow/io repo.

Without this change, actions in a forked repo will fail.

Note that once bazel cache read permissions are enabled on GCS,
forked repos will be able to access the bazel cache (read-only).

Signed-off-by: Yong Tang <[email protected]>
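
One way to picture the repo check described above is the Python sketch below. It is not the workflow from this PR: the `bazel_cache_flags` helper and the cache URL are hypothetical, and it assumes forks already have read access to the cache. `GITHUB_REPOSITORY` is the variable GitHub Actions sets to `owner/repo`, and `--remote_cache` and `--remote_upload_local_results` are standard bazel flags.

```python
import os
from typing import List, Mapping, Optional

# Placeholder URL; the real cache location is not given in this PR.
CACHE_URL = "https://storage.googleapis.com/example-bazel-cache"


def bazel_cache_flags(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build bazel remote-cache flags, allowing writes only upstream."""
    env = os.environ if env is None else env
    flags = [f"--remote_cache={CACHE_URL}"]
    # GITHUB_REPOSITORY is set by GitHub Actions to "owner/repo".
    if env.get("GITHUB_REPOSITORY") != "tensorflow/io":
        # Forks (and local runs) may read the cache but never upload to it.
        flags.append("--remote_upload_local_results=false")
    return flags
```

Keeping the decision in one place means a fork's CI never needs write credentials for the cache.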

* Enable read-only bazel cache (#1294)

This PR enables read-only bazel cache

Signed-off-by: Yong Tang <[email protected]>

* Rename tests (#1297)

* Combine Ubuntu 20.04 and CentOS 7 tests into one GitHub job (#1299)

GitHub Actions appears to have an implicit limit on concurrent
jobs. As a result, the CentOS 7 test is normally scheduled after
other jobs complete. However, the CentOS 7 test often hangs
(e.g., https://github.com/tensorflow/io/runs/1825943449). This is likely
because the CentOS 7 test sits in the GitHub Actions queue for too long.

This PR moves the CentOS 7 test to run after the Ubuntu 20.04 test completes, to try to
avoid hangs.

Signed-off-by: Yong Tang <[email protected]>

* Update names of api tests (#1300)

We renamed the tests to remove the "_eager" parts. This PR updates the api test to use the correct filenames.

Signed-off-by: Yong Tang <[email protected]>

* Fix wrong benchmark tests names (#1301)

Fixes wrong benchmark tests names caused by last commit

Signed-off-by: Yong Tang <[email protected]>

* Patch arrow to temporarily resolve the ARROW-11518 issue (#1304)

This PR patches arrow to temporarily resolve the ARROW-11518 issue.

See 1281 for details

Credit to diggerk.

We will update arrow after the upstream PR is merged.

Signed-off-by: Yong Tang <[email protected]>

* Remove AWS headers from tensorflow, and use headers from third_party … (#1241)

* Remove external headers from tensorflow, and use third_party headers instead

This PR removes external headers from tensorflow and
uses third_party headers instead.

Signed-off-by: Yong Tang <[email protected]>

* Address review comment

Signed-off-by: Yong Tang <[email protected]>

* Switch to use github to download libgeotiff (#1307)

Signed-off-by: Yong Tang <[email protected]>

* Add @com_google_absl//absl/strings:cord (#1308)

Fix read/STDIN_FILENO

Signed-off-by: Yong Tang <[email protected]>

* Switch to modular file system for hdfs (#1309)

* Switch to modular file system for hdfs

This PR is part of the effort to switch to modular file system for hdfs.
When TF_ENABLE_LEGACY_FILESYSTEM=1 is provided, old behavior will
be preserved.

Signed-off-by: Yong Tang <[email protected]>
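
For readers who want the old behavior, the opt-out mentioned above could be set from Python as in the sketch below. Only the variable name and value come from the commit message; the requirement to set it before the import is an assumption (that the variable is consulted when the tensorflow-io libraries are loaded).

```python
import os

# Must happen before the import: this sketch assumes the variable is
# consulted when the tensorflow-io shared libraries are loaded.
os.environ["TF_ENABLE_LEGACY_FILESYSTEM"] = "1"

try:
    import tensorflow_io as tfio  # noqa: F401
except ImportError:
    tfio = None  # tensorflow-io is not installed; the variable is still set
```

Setting the variable in the shell that launches the process has the same effect and avoids any ordering concerns.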

* Build against tf-nightly

Signed-off-by: Yong Tang <[email protected]>

* Update tests

Signed-off-by: Yong Tang <[email protected]>

* Adjust the if else logic, follow review comment

Signed-off-by: Yong Tang <[email protected]>

* Disable test_write_kafka test for now. (#1310)

With the tensorflow upgrade to tf-nightly, the test_write_kafka test
is failing, which is blocking the planned migration to the modular file system.

This PR disables the test temporarily so that CI can continue
to push the tensorflow-io-nightly image (needed for the modular file system migration).

Signed-off-by: Yong Tang <[email protected]>

* Switch to modular file system for s3 (#1312)

This PR is part of the effort to switch to modular file system for s3.
When TF_ENABLE_LEGACY_FILESYSTEM=1 is provided, old behavior will
be preserved.

Signed-off-by: Yong Tang <[email protected]>

* Add python 3.9 on Windows (#1316)

* Updates the PR to use attribute instead of Env Variable

-Originally AVRO_PARSER_NUM_MINIBATCH was set as an environment
variable. Because tensorflow-io rarely uses env vars to fine-tune
kernel ops, this was changed to an attribute. See comment here:
#1283 (comment)

* Added AVRO_PARSER_NUM_MINIBATCH to override num_minibatches

Added AVRO_PARSER_NUM_MINIBATCH to override num_minibatches. This is recommended to be set equal to the vcore request.

* Updates the PR to use attribute instead of Env Variable

-Originally AVRO_PARSER_NUM_MINIBATCH was set as an environment
variable. Because tensorflow-io rarely uses env vars to fine-tune
kernel ops, this was changed to an attribute. See comment here:
#1283 (comment)

* Adds additional comments in source code for understandability

Co-authored-by: Abin Shahab <[email protected]>
Co-authored-by: Yong Tang <[email protected]>
Co-authored-by: Vo Van Nghia <[email protected]>
Co-authored-by: Vignesh Kothapalli <[email protected]>
Co-authored-by: Cheng Ren <[email protected]>
Co-authored-by: Cheng Ren <[email protected]>
Co-authored-by: Dale Lane <[email protected]>
Co-authored-by: Yuan Tang <[email protected]>
Co-authored-by: Mark Daoust <[email protected]>
michaelbanfield pushed a commit to michaelbanfield/io that referenced this pull request Mar 30, 2021
…1304)

michaelbanfield pushed a commit to michaelbanfield/io that referenced this pull request Mar 30, 2021
…he parsing time (tensorflow#1283)

zheolong pushed a commit to zheolong/io-1 that referenced this pull request Jul 24, 2025
…1304)

zheolong pushed a commit to zheolong/io-1 that referenced this pull request Jul 24, 2025
…he parsing time (tensorflow#1283)

* Exposes num_parallel_reads and num_parallel_calls

-Exposes `num_parallel_reads` and `num_parallel_calls` in AvroRecordDataset and `make_avro_record_dataset`
-Adds parameter constraints
-Fixes lint issues
-Adds test method for _require() function
-This update adds a test to check if ValueErrors
are raised when given an invalid input for num_parallel_calls

* Bump Apache Arrow to 2.0.0 (tensorflow#1231)

* Bump Apache Arrow to 2.0.0

Also bumps Apache Thrift to 0.13.0

Signed-off-by: Yong Tang <[email protected]>

* Update code to match Arrow

Signed-off-by: Yong Tang <[email protected]>

* Bump pyarrow to 2.0.0

Signed-off-by: Yong Tang <[email protected]>

* Stay with version=1 for write_feather to pass tests

Signed-off-by: Yong Tang <[email protected]>

* Bump flatbuffers to 1.12.0

Signed-off-by: Yong Tang <[email protected]>

* Fix Windows issue

Signed-off-by: Yong Tang <[email protected]>

* Fix tests

Signed-off-by: Yong Tang <[email protected]>

* Fix Windows

Signed-off-by: Yong Tang <[email protected]>

* Remove -std=c++11 and leave default -std=c++14 for arrow build

Signed-off-by: Yong Tang <[email protected]>

* Update sha256 of libapr1

As the hash changed by the repo.

Signed-off-by: Yong Tang <[email protected]>

* Add emulator for gcs (tensorflow#1234)

* Bump com_github_googleapis_google_cloud_cpp to `1.21.0`

* Add gcs testbench

* Bump `libcurl` to `7.69.1`

* Remove the CI build for CentOS 8 (tensorflow#1237)

Building shared libraries on CentOS 8 is pretty much the same as
on Ubuntu 20.04 except `apt` should be changed to `yum`. For that
our CentOS 8 CI test is not adding a lot of value.

Furthermore with the upcoming CentOS 8 change:
https://www.phoronix.com/scan.php?page=news_item&px=CentOS-8-Ending-For-Stream

CentOS 8 is effectively EOLed at 2021.

For that we may want to drop the CentOS 8 build (only leave a comment in README.md)

Note we keep CentOS 7 build for now as there are still many users using
CentOS 7 and CentOS 7 will only be EOLed at 2024. We might drop CentOS 7 build in
the future as well if there is similiar changes to CentOS 7 like CentOS 8.

Signed-off-by: Yong Tang <[email protected]>

* add tf-c-header rule (tensorflow#1244)

* Skip  tf-nightly:tensorflow-io==0.17.0 on API compatibility test (tensorflow#1247)

Signed-off-by: Yong Tang <[email protected]>

* [s3] add support for testing on macOS (tensorflow#1253)

* [s3] add support for testing on macOS

* modify docker-compose cmd

* add notebook formatting instruction in README (tensorflow#1256)

* [docs] Restructure README.md content (tensorflow#1257)

* Refactor README.md content

* bump to run ci jobs

* Update libtiff/libgeotiff dependency (tensorflow#1258)

This PR updates libtiff/libgeotiff to the latest version.

Signed-off-by: Yong Tang <[email protected]>

* remove unstable elasticsearch test setup on macOS (tensorflow#1263)

* Exposes num_parallel_reads and num_parallel_calls (tensorflow#1232)

-Exposes `num_parallel_reads` and `num_parallel_calls` in AvroRecordDataset and `make_avro_record_dataset`
-Adds parameter constraints
-Fixes lint issues
- Adds test method for _require() function
-This update adds a test to check if ValueErrors
are raised when given an invalid input for num_parallel_calls

Co-authored-by: Abin Shahab <[email protected]>

* Added AVRO_PARSER_NUM_MINIBATCH to override num_minibatches

Added AVRO_PARSER_NUM_MINIBATCH to override num_minibatches. This is recommended to be set equal to the vcore request.

* Exposes num_parallel_reads and num_parallel_calls (tensorflow#1232)

* Exposes num_parallel_reads and num_parallel_calls

-Exposes `num_parallel_reads` and `num_parallel_calls` in AvroRecordDataset and `make_avro_record_dataset`
-Adds parameter constraints
-Fixes lint issues

* Exposes num_parallel_reads and num_parallel_calls

-Exposes `num_parallel_reads` and `num_parallel_calls` in AvroRecordDataset and `make_avro_record_dataset`
-Adds parameter constraints
-Fixes lint issues

* Exposes num_parallel_reads and num_parallel_calls

-Exposes `num_parallel_reads` and `num_parallel_calls` in AvroRecordDataset and `make_avro_record_dataset`
-Adds parameter constraints
-Fixes lint issues

* Fixes Lint Issues

* Removes Optional typing for method parameter

-

* Adds test method for _require() function

-This update adds a test to check if ValueErrors
are raised when given an invalid input for num_parallel_calls

* Uncomments skip for macOS pytests

* Fixes Lint issues

Co-authored-by: Abin Shahab <[email protected]>

* add avro tutorial testing data (tensorflow#1267)

Co-authored-by: Cheng Ren <[email protected]>

* Update Kafka tutorial to work with Apache Kafka (tensorflow#1266)

* Update Kafka tutorial to work with Apache Kafka

Minor update to the Kafka tutorial to remove the dependency on
Confluent's distribution of Kafka, and instead work with vanilla
Apache Kafka.

Signed-off-by: Dale Lane <[email protected]>

* Address review comments

Remove redundant pip install commands

Signed-off-by: Dale Lane <[email protected]>

* add github workflow for performance benchmarking (tensorflow#1269)

* add github workflow for performance benchmarking

* add github-action-benchmark step

* handle missing dependencies while benchmarking (tensorflow#1271)

* handle missing dependencies while benchmarking

* setup test_sql

* job name change

* set auto-push to true

* remove auto-push

* add personal access token

* use alternate method to push to gh-pages

* add name to the action

* use different id

* modify creds

* use github_token

* change repo name

* set auto-push

* set origin and push results

* set env

* use PERSONAL_GITHUB_TOKEN

* use push changes action

* use github.head_ref to push the changes

* try using fetch-depth

* modify branch name

* use alternative push approach

* git switch -

* test by merging with forked master

* Disable s3 macOS for now as docker is not working on GitHub Actions for macOS (tensorflow#1277)

* Revert "[s3] add support for testing on macOS (tensorflow#1253)"

This reverts commit bced582.

Signed-off-by: Yong Tang <[email protected]>

* Update

Signed-off-by: Yong Tang <[email protected]>

* rename testing data files (tensorflow#1278)

* Add tutorial for avro dataset API (tensorflow#1250)

* remove docker based mongodb tests in macos (tensorflow#1279)

* trigger benchmarks workflow only on commits (tensorflow#1282)

* Bump Apache Arrow to 3.0.0 (tensorflow#1285)

Signed-off-by: Yong Tang <[email protected]>

* Add bazel cache (tensorflow#1287)

Signed-off-by: Yong Tang <[email protected]>

* Add initial bigtable stub test (tensorflow#1286)

* Add initial bigtable stub test

Signed-off-by: Yong Tang <[email protected]>

* Fix kokoro test

Signed-off-by: Yong Tang <[email protected]>

* Add reference to github-pages benchmarks in README (tensorflow#1289)

* add reference to github-pages benchmarks

* minor grammar change

* Update README.md

Co-authored-by: Yuan Tang <[email protected]>

* Clear outputs (tensorflow#1292)

* fix kafka online-learning section in tutorial notebook (tensorflow#1274)

* kafka notebook fix for colab env

* change timeout from 30 to 20 seconds

* reduce stream_timeout

* Only enable bazel caching writes for tensorflow/io github actions (tensorflow#1293)

This PR enables bazel cache writes only for GitHub Actions
that run on the tensorflow/io repo.

Without this update, actions run from a forked repo will fail with an error.

Note that once bazel cache read permissions are enabled on GCS,
forked repos will be able to access the bazel cache (read-only).

Signed-off-by: Yong Tang <[email protected]>

* Enable read-only bazel cache (tensorflow#1294)

This PR enables the read-only bazel cache.

Signed-off-by: Yong Tang <[email protected]>

* Rename tests (tensorflow#1297)

* Combine Ubuntu 20.04 and CentOS 7 tests into one GitHub job (tensorflow#1299)

GitHub Actions appears to have an implicit limit on concurrent jobs.
As a result, the CentOS 7 test is normally scheduled after the other
jobs complete. However, the CentOS 7 test often hangs
(e.g., https://github.com/tensorflow/io/runs/1825943449), likely
because it sits in the GitHub Actions queue for too long.

This PR moves the CentOS 7 test to run after the Ubuntu 20.04 test
completes, to try to avoid hangs.

Signed-off-by: Yong Tang <[email protected]>

* Update names of api tests (tensorflow#1300)

We renamed the tests to remove the "_eager" parts. This PR updates the api test to use the correct filenames.

Signed-off-by: Yong Tang <[email protected]>

* Fix wrong benchmark test names (tensorflow#1301)

Fixes wrong benchmark test names caused by the last commit

Signed-off-by: Yong Tang <[email protected]>

* Patch arrow to temporarily resolve the ARROW-11518 issue (tensorflow#1304)

This PR patches arrow to temporarily resolve the ARROW-11518 issue.

See 1281 for details

Credit to diggerk.

We will update arrow after the upstream PR is merged.

Signed-off-by: Yong Tang <[email protected]>

* Remove AWS headers from tensorflow, and use headers from third_party … (tensorflow#1241)

* Remove external headers from tensorflow, and use third_party headers instead

This PR removes external headers from tensorflow, and
uses third_party headers instead.

Signed-off-by: Yong Tang <[email protected]>

* Address review comment

Signed-off-by: Yong Tang <[email protected]>

* Switch to use github to download libgeotiff (tensorflow#1307)

Signed-off-by: Yong Tang <[email protected]>

* Add @com_google_absl//absl/strings:cord (tensorflow#1308)

Fix read/STDIN_FILENO

Signed-off-by: Yong Tang <[email protected]>

* Switch to modular file system for hdfs (tensorflow#1309)

* Switch to modular file system for hdfs

This PR is part of the effort to switch to modular file system for hdfs.
When TF_ENABLE_LEGACY_FILESYSTEM=1 is provided, old behavior will
be preserved.

Signed-off-by: Yong Tang <[email protected]>

* Build against tf-nightly

Signed-off-by: Yong Tang <[email protected]>

* Update tests

Signed-off-by: Yong Tang <[email protected]>

* Adjust the if-else logic, following review comment

Signed-off-by: Yong Tang <[email protected]>

* Disable test_write_kafka test for now. (tensorflow#1310)

With the tensorflow upgrade to tf-nightly, the test_write_kafka test
is failing, which blocks the planned modular file system migration.

This PR disables the test temporarily so that CI can continue
to push the tensorflow-io-nightly image (needed for the modular file system migration).

Signed-off-by: Yong Tang <[email protected]>

* Switch to modular file system for s3 (tensorflow#1312)

This PR is part of the effort to switch to modular file system for s3.
When TF_ENABLE_LEGACY_FILESYSTEM=1 is provided, old behavior will
be preserved.

Signed-off-by: Yong Tang <[email protected]>
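
A minimal sketch of opting back into the old behavior from Python, for either the hdfs or the s3 switch. Only the variable name `TF_ENABLE_LEGACY_FILESYSTEM` and the value `1` come from the commit messages above; setting it before the import is an assumption about when the flag is read, and the import itself is left commented out so the snippet runs without tensorflow-io installed:

```python
import os

# Assumption: the flag is read when the tensorflow_io plugin is loaded,
# so it is set before the import rather than after.
os.environ["TF_ENABLE_LEGACY_FILESYSTEM"] = "1"

# import tensorflow_io as tfio  # uncomment where tensorflow-io is installed

# Confirm the flag is visible to this process.
legacy_enabled = os.environ.get("TF_ENABLE_LEGACY_FILESYSTEM") == "1"
```

Exporting the variable in the shell before launching Python should have the same effect on the process environment.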

* Add python 3.9 on Windows (tensorflow#1316)

* Updates the PR to use attribute instead of Env Variable

-Originally AVRO_PARSER_NUM_MINIBATCH was set as an environment
variable. Because tensorflow-io rarely uses env vars to fine-tune
kernel ops, this was changed to an attribute. See comment here:
tensorflow#1283 (comment)

* Added AVRO_PARSER_NUM_MINIBATCH to override num_minibatches

Added AVRO_PARSER_NUM_MINIBATCH to override num_minibatches. It is recommended to set this equal to the vcore request.
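
A rough sketch of how such an override could work, assuming a hypothetical `resolve_num_minibatches` helper; the real parsing happens inside the Avro kernel op and is not shown in this PR description, so the fallback and validation rules here are illustrative only:

```python
import os


def resolve_num_minibatches(default, environ=os.environ):
    """Hypothetical helper: prefer AVRO_PARSER_NUM_MINIBATCH when it is set."""
    raw = environ.get("AVRO_PARSER_NUM_MINIBATCH")
    if raw is None:
        # No override present, keep the computed default.
        return default
    value = int(raw)
    if value <= 0:
        raise ValueError(
            f"AVRO_PARSER_NUM_MINIBATCH must be positive, got {value}"
        )
    return value
```

For example, with a vcore request of 8, exporting `AVRO_PARSER_NUM_MINIBATCH=8` would make this helper return 8 regardless of the default.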

* Updates the PR to use attribute instead of Env Variable

-Originally AVRO_PARSER_NUM_MINIBATCH was set as an environment
variable. Because tensorflow-io rarely uses env vars to fine-tune
kernel ops, this was changed to an attribute. See comment here:
tensorflow#1283 (comment)

* Adds additional comments in source code for understandability

Co-authored-by: Abin Shahab <[email protected]>
Co-authored-by: Yong Tang <[email protected]>
Co-authored-by: Vo Van Nghia <[email protected]>
Co-authored-by: Vignesh Kothapalli <[email protected]>
Co-authored-by: Cheng Ren <[email protected]>
Co-authored-by: Dale Lane <[email protected]>
Co-authored-by: Yuan Tang <[email protected]>
Co-authored-by: Mark Daoust <[email protected]>